System and method for interacting with a display

ABSTRACT

A system and method of interacting with a display. The method comprises recognizing a disturbance in a display zone of a projected image and displaying a selected state in response to the recognized disturbance. The method further includes recognizing a gesture which interrupts a light source and is associated with an action to be taken on or associated with the displayed selected state. An action is executed in response to the recognized gesture. The system includes a server having a database containing data associated with at least one or more predefined gestures, and at least one of a hardware and software component for executing an action based on the at least one or more predefined gestures.

CROSS REFERENCE TO RELATED APPLICATION

The present application is a continuation of U.S. application Ser. No. 11/552,811, filed Oct. 25, 2006, the contents of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

The invention generally relates to a system and method for interacting with a projected display and, more particularly, to a system and method for interacting with a projected display utilizing gestures capable of executing menu driven commands and other complex command structures.

BACKGROUND OF THE INVENTION

Businesses strive for efficiencies throughout their organization. These efficiencies result in increased productivity of their employees which, in turn, results in increased profitability for the business and, if publicly traded, its shareholders. To achieve such efficiencies, by way of example, it is not uncommon to hold meetings or make presentations to audiences to discuss new strategies, advances in the industry, new technologies, etc.

In such meetings, presentation boards or so-called “whiteboards” are one way to present material relevant to the presentation or meeting. As is well known, a whiteboard allows a presenter to write using special “dry erase” markers. When the text is no longer needed, such material may be erased so that the user can continue with the presentation, for example. Unfortunately, the text often needs to be saved in order to refer back to the material or place new material in the proper context. In these situations, an attendee may save the material by manually copying the text in a notebook before the image is erased by the presenter. A problem with this approach is that it is both time consuming and error prone. Also, the use of whiteboards is limited because it is difficult to draw charts or other graphical images and it is not possible to manipulate data.

In another approach, it is not uncommon to use large scrolls or tear-off pieces of paper to make the presentation. By using this approach, the presenter merely removes the paper from the pad (or rolls the paper) and then continues with the next sheet. This approach, though, can be cumbersome and, although it allows the presenter to refer back to past writings, it is not very efficient. Additionally, this can result in many different sheets or very large scrolls of one sheet, which can become confusing to the audience and even the presenter. Also, as with the above approach, it is difficult to draw charts or other graphical images, and it is not possible to manipulate data.

In a more technologically efficient approach, the presenter can present charts or other graphical images to an audience by optically projecting these images onto a projection screen or a wall. In known applications, an LCD (liquid crystal display) projector is commonly used as the image source, where the charts, text, or other graphical images are electronically generated by a display computer, such as a personal computer (PC) or a laptop computer. In such display systems, the PC provides video outputs, but interaction with the output is limited, at best.

Also, whether the presenter is standing at a lectern or is moving about before the audience, there is little direct control over the image being displayed upon the projection screen when using a conventional LCD/PC projection display system. For example, a conventional system requires the presenter to return to the display computer so as to provide control for the presentation. At the display computer, the presenter controls the displayed image by means of keystrokes or by “mouse commands” with a cursor in the appropriate area of the computer monitor display screen.

In some applications, an operator may use a remote control device to wirelessly transmit control signals to a projector sensor. Although the presenter acquires some mobility by means of the remote control device, the presenter still cannot interact with the data on the screen itself; that is, the operator is limited to either advancing or reversing the screen.

Accordingly, there exists a need in the art to overcome the deficiencies and limitations described hereinabove.

SUMMARY OF THE INVENTION

In a first aspect of the invention, a method comprises recognizing a disturbance in a display zone of a projected image and displaying a selected state in response to the recognized disturbance. The method further includes recognizing a gesture which interrupts a light source and is associated with an action to be taken on or associated with the displayed selected state. An action is executed in response to the recognized gesture.

In another aspect of the invention, the method comprises projecting an image on a surface using at least a source of light and a processor configured to store and execute application programs associated with the image. The method senses a first action in a display zone of the image and validates the first action. The method displays a selected state in response to the validated first action. The method further senses a gesture interrupting the light source and validates that the gesture is associated with a pre-defined command and the displayed selected state. The method executes the pre-defined command in response to the validated gesture.

In another aspect of the invention, a system comprises a server having a database containing data associated with at least one or more predefined gestures, and at least one of a hardware and software component for executing an action based on the at least one or more predefined gestures. The hardware and software compares a first action in an interaction zone to a predefined template of a shape, and a second action, which interrupts a light source, to the at least one or more predefined gestures. The system validates the first action and the second action based on the comparison to the predefined template and the at least one or more predefined gestures. The system executes the action based on the validating of the first action and the second action.

In yet another aspect of the invention, a computer program product comprising a computer usable medium having readable program code embodied in the medium includes at least one component to perform the steps of the invention, as disclosed and recited herein.

In still another embodiment, a method comprises recognizing a first action of a first object and a second action of a second object. The method further includes validating a movement comprising a combination of the first action and the second action by comparison to predefined gestures and executing a complex command based on the validating of the combination of the first action and the second action.

In a further aspect of the invention, a method for deploying an application for web searching comprises providing a computer infrastructure. The computer infrastructure is operable to: project an image on a surface; sense a first action in a predefined interaction zone of the image; validate the first action and display a selected state; sense a gesture; validate that the gesture is associated with a pre-defined action; and execute the pre-defined action in response to the validated gesture.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an illustrative environment for implementing the steps in accordance with the invention;

FIG. 2 shows an embodiment of a system in accordance with the invention;

FIG. 3 is a representation of a range of motion of the system in a representative environment in accordance with an embodiment of the invention;

FIG. 4 represents a method to correct for distortion of a projected image on a surface or object;

FIG. 5 shows a system architecture according to an embodiment of the invention;

FIGS. 6a and 6b show a representative look-up table according to an embodiment of the invention;

FIG. 7 shows an illustrative template used in accordance with an embodiment of the invention; and

FIG. 8 is a representation of a swim lane diagram implementing steps according to an embodiment of the invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

The invention is directed to a system and method for interacting with a projected display and, more specifically, to a system and method for interacting with a projected display utilizing gestures capable of executing menu driven commands and other complex command structures. The system and method can be implemented using a single computer, over any distributed network or stand-alone server, for example. In embodiments, the system and method is configured to be used as an interactive touch screen projected onto any surface, and which allows the user to perform and/or execute any command on the interactive touch screen surface without the need for a peripheral device such as, for example, a mouse or keyboard. Accordingly, the system and method is configured to provide device-free, non-tethered interaction with a display projected on any number of different surfaces, objects and/or areas in an environment.

The system and method of the invention projects displays on different surfaces such as, for example, walls, desks, presentation boards and the like. In implementations, the system and method allows complex commands to be executed such as, for example, opening a new file using a drag down menu, or operations such as cutting, copying, pasting or other commands that require more than a single command step. It should be understood, though, that the system and method may also implement and execute single step commands.

In embodiments, the commands are executed using gestures, which are captured, reconciled and executed by a computer. The actions to be executed, in one implementation, require two distinct actions by the user as implemented by a user's hands, pointers of some kind or any combination thereof. Thus, the system and method of the invention does not require any special devices to execute the requested commands and, accordingly, is capable of sensing and supporting forms of interaction such as hand gestures and/or motion of objects, etc., to perform such complex operations.

In embodiments, the system and method can be implemented using, for example, the Everywhere Display™, manufactured and sold by International Business Machines Corp. (Everywhere Display™ and IBM are trademarks of IBM Corp. in the United States, other countries, or both.) By way of example, the Everywhere Display can provide computer access in public spaces, facilitate navigation in buildings, localize resources in a physical space, bring computational resources to different areas of an environment, and facilitate the reconfiguration of the workplace.

FIG. 1 shows an illustrative environment 10 for managing the processes in accordance with the invention. To this extent, the environment 10 includes a computer infrastructure 12 that can perform the processes described herein. In particular, the computer infrastructure 12 includes a computing device 14 that comprises a management system 30, which makes computing device 14 operable to perform complex commands using gestures in accordance with the invention, e.g., the processes described herein. The computing device 14 includes a processor 20, a memory 22A, an input/output (I/O) interface 24, and a bus 26. Further, the computing device 14 is in communication with an external I/O device/resource 28 and a storage system 22B.

In general, the processor 20 executes computer program code, which is stored in memory 22A and/or storage system 22B. While executing computer program code, the processor 20 can read and/or write data from look-up tables, which are the basis for the execution of the commands to be performed on the computer, to/from memory 22A, storage system 22B, and/or I/O interface 24. The bus 26 provides a communications link between each of the components in the computing device 14. The I/O device 28 can comprise any device that enables an individual to interact with the computing device 14 or any device that enables the computing device 14 to communicate with one or more other computing devices using any type of communications link.

The computing device 14 can comprise any general purpose computing article of manufacture capable of executing computer program code installed thereon (e.g., a personal computer, server, handheld device, etc.). However, it is understood that the computing device 14 is only representative of various possible equivalent computing devices that may perform the processes described herein. To this extent, in embodiments, the functionality provided by computing device 14 can be implemented by a computing article of manufacture that includes any combination of general and/or specific purpose hardware and/or computer program code. In each embodiment, the program code and hardware can be created using standard programming and engineering techniques, respectively.

Similarly, the computer infrastructure 12 is only illustrative of various types of computer infrastructures for implementing the invention. For example, in embodiments, the computer infrastructure 12 comprises two or more computing devices (e.g., a server cluster) that communicate over any type of communications link, such as a network, a shared memory, or the like, to perform the process described herein. Further, while performing the process described herein, one or more computing devices in the computer infrastructure 12 can communicate with one or more other computing devices external to computer infrastructure 12 using any type of communications link. The communications link can comprise any combination of wired and/or wireless links; any combination of one or more types of networks (e.g., the Internet, a wide area network, a local area network, a virtual private network, etc.); and/or utilize any combination of transmission techniques and protocols. As discussed herein, the management system 30 enables the computer infrastructure 12 to recognize gestures and execute associated commands.

In embodiments, the invention provides a business method that performs the process steps of the invention on a subscription, advertising, and/or fee basis. That is, a service provider, such as a Solution Integrator, could offer to perform the processes described herein. In this case, the service provider can create, maintain, and support, etc., a computer infrastructure that performs the process steps of the invention for one or more customers. In return, the service provider can receive payment from the customer(s) under a subscription and/or fee agreement and/or the service provider can receive payment from the sale of advertising content to one or more third parties.

FIG. 2 shows an embodiment of the system of the invention. As shown in FIG. 2, the system is generally depicted as reference numeral 100 and comprises a projector 110 (e.g., an LCD projector) and a computer-controlled pan/tilt mirror 120. The projector 110 is connected to the display output of a computer 130, which also controls the mirror 120. In one non-limiting illustrative example, the light of the projector 110 can be directed in any direction within a range of approximately 60 degrees in the vertical axis and 230 degrees in the horizontal axis. Those of skill in the art should understand that other ranges are contemplated by the invention such as, for example, a range of 360 degrees in the horizontal and/or vertical axis. In embodiments, using the above ranges, the system 100 is capable of projecting a graphical display on most parts of all walls and almost all of the floor or other areas of a room. In embodiments, the projector 110 is a 1200-lumen LCD projector.

Still referring to FIG. 2, a camera 140 is also connected to the computer 130 and is configured to capture gestures or motions of the user and provide such gestures or motions to the computer 130 for reconciliation and execution of commands (as discussed in greater detail below). The camera 140 is preferably a CCD-based camera which is configured and located to capture motions and the like of the user. The camera 140 and other devices may be connected to the computer via any known networking system, as discussed above.

FIG. 3 is a representation of a range of motion of the system in a representative environment according to an embodiment of the invention. As shown in FIG. 3, the system 100 of the invention is configured to project a graphical display on walls, the floor, and a table, for example. Of course, depending on the range of the projector, the system 100 is capable of projecting images on most any surface within an environment, thus transforming most any surface into an interactive display.

FIG. 4 represents a graphical methodology to correct for distortion of the projected image caused by oblique projection and by the shape of the projected surface. To make such a correction, the image to be projected is inversely distorted prior to projection on the desired surface using, for example, standard computer graphics hardware to speed up the process of distortion control. By way of illustrative example, one methodology relies on the camera 140 and projector 110 having the same focal length. Therefore, to project an image obliquely without distortions, it is sufficient to simulate the inverse process (i.e., viewing with a camera) in a virtual 3D computer graphics world.

More specifically, as shown in FIG. 4, the system and method of the invention texture-maps the image to be displayed onto a virtual computer graphics 3D surface “VS” identical (minus a scale factor) to the actual surface “AS”. The view from the 3D virtual camera 140 should correspond exactly or substantially exactly to the view of the projector (if the projector were the camera) when:

-   the position and attitude of the surface in the 3D virtual space in relation to the 3D virtual camera is identical (minus a scale factor) to the relation between the real surface and the projector, and
-   the virtual camera has an identical or substantially identical focal length to the projector.

In embodiments, a standard computer graphics board may be used to render the camera's view of the virtual surface and send the computed view to the projector 110. If the position and attitude of the virtual surface “VS” are correct, the projection of this view compensates for the distortion caused by oblique projection or by the shape of the surface. Of course, an appropriate virtual 3D surface can be uniquely used and calibrated for each surface where images are projected. In embodiments, the calibration parameters of the virtual 3D surface may be determined manually by projecting a special pattern and interactively adjusting the scale, rotation and position of the virtual surface in the 3D world, and the “lens angle” of the 3D virtual camera.
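For the special case of a flat projection surface, the inverse-distortion described above reduces to pre-warping the image with a planar homography. The following is a minimal, non-authoritative Python/OpenCV sketch of that special case; the four corner correspondences stand in for the interactive calibration step described above and are purely hypothetical values:

```python
import cv2
import numpy as np

# Corners of the image to be projected, in pixel coordinates.
src = np.float32([[0, 0], [1024, 0], [1024, 768], [0, 768]])

# Where those corners must land in the projector's frame so that the
# obliquely projected image appears undistorted on the surface; in
# practice these points would come from the calibration step above.
dst = np.float32([[80, 40], [990, 10], [1010, 740], [60, 700]])

# The homography plays the role of the virtual camera's view of the
# texture-mapped surface: it inversely distorts the image before
# projection so the oblique projection cancels the warp.
H = cv2.getPerspectiveTransform(src, dst)

image = cv2.imread("slide.png")
prewarped = cv2.warpPerspective(image, H, (1024, 768))
cv2.imwrite("slide_prewarped.png", prewarped)
```

For non-planar surfaces, the patent's full virtual-3D-surface rendering approach would replace the single homography.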

FIG. 5 shows a system architecture according to an embodiment of the invention. In embodiments, the system architecture includes a three-tier architecture comprising a services layer 300, an integration layer 310 and an application layer 320. In embodiments, each of the modules 300a-300f in the services layer 300 exposes a set of capabilities through an http/XML application programming interface (API). In embodiments, modules in the services layer 300 have no “direct” knowledge of or dependence on other modules in the layer; however, the modules 300a-300f may share a common XML language along with a dialect for communication with each module in the services layer 300.

In embodiments, the services layer 300 includes six modules 300a-300f. For example, a vision interface module (vi) 300a may be responsible for recognizing gestures and conveying this information to the application (e.g., the program being manipulated by the gestures). A projection module (pj) 300b may handle the display of visual information (via the projector) on a specified surface, while a camera module (sc) 300c provides the video input (via the camera) from the surface of interest to the vision interface (vi) 300a. The camera, as discussed above, will send the gestures and other motions of the user. Interaction with the interface by the user comprises orchestrating the vision interface 300a, projection module 300b and camera module 300c through a sequence of synchronous and asynchronous commands, which are capable of being implemented by those of skill in the art. Other modules present in the services layer 300 include a 3D environment modeling module 300d, a user localization module 300e, and a geometric reasoning module 300f.

The 3D environment modeling module 300d can be a version of standard 3D modeling software. The 3D environment modeling module 300d can support basic geometric objects built out of planar surfaces and cubes and allows importing of more complex models. In embodiments, the 3D environment modeling module 300d stores the model in XML format, with objects as tags and annotations as attributes. The 3D environment modeling module 300d is also designed to be accessible to the geometric reasoning module 300f, as discussed below.

The geometric reasoning module 300f is a geometric reasoning engine that operates on a model created by a modeling toolkit which, in embodiments, is a version of standard 3D modeling software. The geometric reasoning module 300f enables automatic selection of the appropriate display and interaction zones (hotspots) based on criteria such as proximity of the zone to the user and non-occlusion of the zone by the user or by other objects. In this manner, gestures can be used to manipulate and execute program commands and/or actions. Applications or other modules can query the geometric reasoning module 300f through a defined XML interface.

In embodiments, the geometric reasoning module 300f receives a user position and a set of criteria, specified as desired ranges of display zone properties, and returns all display zones which satisfy the specified criteria (a simplified selection routine is sketched after the list below). The geometric reasoning module 300f may also have a look-up table, or access thereto, for determining gestures of a user, which may be used to implement the actions or commands associated with a certain application. The properties for a display zone may include, amongst other properties, the following:

1)  Physical size of the display zone in some specified units such as inches or centimeters.
2)  Absolute orientation, defined as the angle between the surface normal of the display zone and a horizontal plane.
3)  User proximity, defined as the distance between the center of the user's head and the center of a display zone.
4)  Position of the user relative to the display zone, defined as the two angles to the user's head in a local spherical coordinate system attached to the display zone. This indicates, for example, whether the user is to the left or to the right of a display zone.
5)  Position of the display zone relative to the user, defined as the two angles to the display zone in a local spherical coordinate system attached to the user's head.
6)  Occlusion percentage, defined as the percentage of the total area of the display zone that is occluded with respect to a specified projector position and orientation.
7)  An occlusion mask, which is a bitmap that indicates the parts of a display zone occluded by other objects in the model or by the user.
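As a simplified illustration of such a criteria-based query, the Python sketch below filters zones by proximity and occlusion. The `DisplayZone` fields and the `select_zones` helper are hypothetical names for illustration only, not elements of the module's actual XML interface:

```python
from dataclasses import dataclass

@dataclass
class DisplayZone:
    name: str
    size_cm: float            # physical size (property 1)
    orientation_deg: float    # angle to horizontal plane (property 2)
    user_proximity_cm: float  # distance to user's head (property 3)
    occlusion_pct: float      # occluded fraction of the zone (property 6)

def select_zones(zones, max_proximity_cm, max_occlusion_pct):
    """Return all display zones that satisfy the specified criteria ranges."""
    return [z for z in zones
            if z.user_proximity_cm <= max_proximity_cm
            and z.occlusion_pct <= max_occlusion_pct]

zones = [DisplayZone("wall-left", 120, 90, 180, 5.0),
         DisplayZone("table", 60, 0, 90, 40.0)]
# Prefer nearby, mostly unoccluded zones for the projected display.
print(select_zones(zones, max_proximity_cm=200, max_occlusion_pct=10))
```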

The user localization module 300e, in embodiments, uses real-time camera-based tracking to determine the position of the user in the environment as well as, in embodiments, gestures of the user. In embodiments, the user localization module 300e can be configured to track the user's motion to, for example, move the display to the user or, in further embodiments, recognize gestures of the user for implementing actions or commands.

In embodiments, the tracking technique is based on motion, shape, and/or flesh-tone cues. In embodiments, a differencing operation on consecutive frames of the incoming video can be performed. A morphological closing operation then removes noise and fills up small gaps in the detected motion regions. A standard contour-tracing algorithm then yields the bounding contours of the segmented regions. The contours are smoothed, and the orientation and curvature along each contour are computed. The shape of each contour is analyzed to check whether it could be a head or other body part or object of interest, which is tracked by the system and method of the invention.

In the example of a head, the system looks for curvature changes corresponding to a head-neck silhouette (e.g., concavities at the neck points and convexity at the top of the head). In embodiments, sufficient flesh-tone color within the detected head region is detected by matching the color of each pixel within the head contour with a model of flesh-tone colors in normalized r-g space. This technique detects multiple heads in real time. In embodiments, multiple cameras with overlapping views may be used to triangulate and estimate the 3D position of the user. This same technique can be used to recognize gestures in order for the user to interact with the display, e.g., provide complex commands.
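The differencing, morphological-closing and contour-tracing steps described above can be illustrated with a minimal Python/OpenCV sketch; the threshold value and kernel size are illustrative assumptions, and the curvature and flesh-tone analysis would operate on the returned contours:

```python
import cv2
import numpy as np

def moving_contours(prev_frame, frame):
    """Segment motion regions between two consecutive video frames
    and return their bounding contours."""
    # Differencing operation on consecutive frames of the incoming video.
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Morphological closing removes noise and fills small gaps
    # in the detected motion regions.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Standard contour tracing yields the bounding contours of the
    # segmented regions (OpenCV 4.x return signature).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours
```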

In embodiments, the integration layer 310 provides a set of classes that enable a JAVA application to interact with the services. (Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.) The integration layer 310, in embodiments, contains a set of JAVA wrapper objects for all objects and commands, along with classes enabling synchronous and asynchronous communication with modules in the services layer 300. The integration layer 310, in embodiments, mediates the interaction among the services layer modules 300a-300f. For example, through a single instruction to the interaction manager 310a, a JAVA application can start an interaction that sends commands to the vision interface, the projection module and the mirror, defining, instantiating, activating, and managing a complex interactive display interaction. Similarly, the integration layer 310, for example, can coordinate the geometric reasoning module and the 3D environment modeler in a manner that returns the current user position along with all occluded surfaces to the application at a specified interval.

In embodiments, the application layer 320 comprises a set of classes and tools for defining and running JAVA applications and a repository of reusable interactions. In embodiments, each interaction is a reusable class that is available to any application. An application class, for example, is a container for composing multiple interactions, maintaining application state during execution, and controlling the sequence of interactions through the help of a sequence manager 320a. Other tools may also be implemented such as, for example, a calibrator tool that allows a developer to calibrate the vision interface module 300a, the projection module 300b and the camera module 300c for a particular application.

In embodiments, the user interacts with the projected display by using hand gestures over the projected surface, as if the hands, for example, were a computer mouse. Techniques described above, such as, for example, those using the geometric reasoning module 300f or the user localization module 300e, can be implemented to recognize such gesturing. By way of non-limiting illustration, the geometric reasoning module 300f may use an occlusion mask, which indicates the parts of a display zone occluded by objects such as, for example, the hands of the gesturing user.

More specifically, in embodiments, the camera may perform three basic steps: (i) detecting when the user is pointing; (ii) tracking where the user is pointing; and (iii) detecting salient events, such as a button touch, from the pointing trajectory and gestures of the user. This may be performed, for example, by detecting an occlusion of the projected image over a certain zone such as, for example, an icon or pull-down menu. This information is then provided to the computer, which then reconciles such gesture with a look-up table, for example.

FIGS. 6a and 6b show a representative look-up table according to an embodiment of the invention. Specifically, it is shown that many complex commands can be executed using gestures such as, for example, a single left click of the mouse executed by the user moving his or her hand in a clockwise rotation. Other gestures are also contemplated by the invention, such as those shown in the look-up tables of FIGS. 6a and 6b. It should be understood, though, that the gestures shown in FIGS. 6a and 6b should be considered merely illustrative examples.

As a further example, the invention further contemplates that a complex command can be executed based on a combination of movements by two (or more) objects such as, for example, both of the user's hands. In this embodiment, the system and method of the invention would attempt to reconcile and/or verify a motion (gesture) of each object, e.g., both hands, using the look-up table of FIGS. 6a and 6b, for example. If both of the motions cannot be independently verified in the look-up table, for example, the system and method would attempt to reconcile and/or verify both of the motions using a look-up table populated with actions associated with combination motions. By way of one illustration, an “S” motion of both hands, which are motions not independently recognized, may be a gesture for taking an action such as requesting insertion of a “watermark” in a word processing application. It should be recognized by those of skill in the art that all actions, whether for a single motion or combination of motions, etc., may be populated in a single look-up table or multiple look-up tables, without any limitations.
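The two-stage reconciliation described above may be sketched as follows; the gesture names, commands and table contents are illustrative stand-ins modeled on the discussion of FIGS. 6a and 6b, not the actual look-up tables:

```python
# Single-motion gestures, as in the look-up table of FIGS. 6a and 6b.
SINGLE_GESTURES = {
    "clockwise": "single_left_click",
    "counter_clockwise": "single_right_click",
}

# Combination motions that are only meaningful when performed together,
# e.g., an "S" motion of both hands requesting a watermark.
COMBINATION_GESTURES = {
    ("s_shape", "s_shape"): "insert_watermark",
}

def reconcile(motion_a, motion_b=None):
    """Try to verify each motion independently; if that fails,
    fall back to the combination-motion table."""
    if motion_b is None:
        return SINGLE_GESTURES.get(motion_a)
    if motion_a in SINGLE_GESTURES and motion_b in SINGLE_GESTURES:
        # Both motions verified independently: treat as separate commands.
        return (SINGLE_GESTURES[motion_a], SINGLE_GESTURES[motion_b])
    return COMBINATION_GESTURES.get((motion_a, motion_b))

print(reconcile("s_shape", "s_shape"))  # -> 'insert_watermark'
```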

FIG. 7 shows a template which may be implemented in embodiments of the invention. Even though the appearance of an object will change as it moves across the projected image, it will create a region of changed pixels that retains the basic shape of the moving object. To find pointing fingertips, for example, each video frame is subtracted from the frame before it, noise is removed with simple computational morphology, and a fingertip template “T” is then convolved over the difference image using a matching function. If the template “T” does not match well in the image, it can be assumed that the user is not pointing or gesturing.

The fingertip template of FIG. 7 is kept short, in embodiments, so that it will match fingertips that extend only slightly beyond their neighbors and will match fingertips within a wider range of angles. As a result, the template often matches well at several points in the image. It should be readily understood that other templates may also be used with the invention such as, for example, pointers and other objects. These templates may also be used for implementing and recognizing the gestures, referring back to FIGS. 6a and 6b.
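A minimal Python/OpenCV sketch of the frame-differencing and template-matching steps described above; normalized cross-correlation is used here as one plausible matching function, and the match threshold is an assumption:

```python
import cv2

def find_fingertips(prev_frame, frame, template, threshold=0.6):
    """Subtract consecutive frames and match a short fingertip
    template 'T' against the difference image.

    template: a small grayscale (uint8) image of a fingertip tip.
    Returns candidate (x, y) match locations, which may be several,
    since the short template often matches well at multiple points.
    """
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))

    # Normalized cross-correlation as the matching function.
    scores = cv2.matchTemplate(diff, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = (scores >= threshold).nonzero()
    if len(xs) == 0:
        return []  # no good match: assume the user is not pointing
    return list(zip(xs.tolist(), ys.tolist()))
```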

FIG. 8 is a swim lane diagram showing steps of an embodiment of the invention. “Swim lane” diagrams may be used to show the relationship between the various “components” in the processes and to define the steps involved in the processes. FIG. 8 may equally represent a high-level block diagram of components of the invention implementing the steps thereof. The steps of FIG. 8 may be implemented in computer program code in combination with the appropriate hardware. This computer program code may be stored on storage media such as a diskette, hard disk, CD-ROM, DVD-ROM or tape, as well as a memory storage device or collection of memory storage devices such as read-only memory (ROM) or random access memory (RAM). Additionally, the computer program code can be transferred to a workstation over the Internet or some other type of network. The steps of FIG. 8 may also be implemented by the embodiment of FIG. 1.

In particular, FIG. 8 shows a process flow diagram describing a scenario in which a user performs a series of actions using the gesture-based user interaction grammar provided herein. At step 800, a user approaches a wall or other surface on which the system of the invention has projected a User Interface (UI) for a given application. At step 805, the user is recognized by the camera as they disturb the field of vision of the camera. At step 810, the user opts to open a menu from an icon in the UI via a right-click action. In embodiments, the user selects the icon with the dominant hand (e.g., left hand). In one example, button touches are detected by examining the hand trajectory for several specific patterns that indicate this type of motion.

At step 815, the camera recognizes the disturbance of the “hotspot” (zone) associated to the selected icon, and calls the system to validate that the shape of the disturbance is identified in the template. At step 820, a determination is made to establish if the shape of the disturbing object is a valid shape in the template. If not, then at step 825, no action is taken; however, as described above, in embodiments, the system may recognize a second disturbance or gesture, at which time the system will make a determination that the combination of the first and second motions (e.g., disturbances) is a unique, valid gesture for an action to be taken.

If a valid shape is found at step 820, then the system displays the selected state of the selected icon at step 830. In an alternative embodiment, the system may recognize two gestures simultaneously, at which time the system will make a determination as to whether the combination of gestures is associated with an action. If so, an appropriate action will be taken. This same or similar processing may continue with other examples.

At step 835, after successful display of the selected state of the icon at step 830, the user uses the non-dominant hand (e.g., right hand) to articulate the gesture associated to a “right-click” action, for example (counter-clockwise rotation; see the look-up table of FIGS. 6a and 6b). At step 840, the camera recognizes the articulation of the gesture, and at step 845, the system performs a lookup to validate that the gesture resides in the system and is associated to an action.

At step 850, a determination is made as to whether the gesture is associated to an action. If there is no associated action, the system will revert to step 825 and take no action. If there is an associated action, at step 855, the system will execute the action (e.g., display an open menu). Thus, after the system successfully identifies the articulated gesture, the system displays the appropriate action (e.g., opening a menu associated to the initially selected icon).

At step 860, the user selects from one of “X” number of possible navigational menu options. At step 865, the camera recognizes the disturbance of the hotspot (interaction zone) associated to the selected menu item, and calls to validate that the shape of the disturbance is identified in the template. At step 870, a determination is made as to whether the shape of the disturbing object is a valid shape in the template. If not recognized, then the system reverts back to step 825 and takes no action. If the gesture is valid (recognized), then at step 875, the system displays the selected state of the selected menu item.

At step 880, after successful display of the selected state of the menu item, the user uses the non-dominant hand, for example, to articulate the gesture associated to a “single left-click” action (single clockwise rotation; see the look-up table of FIGS. 6a and 6b). At steps 885 and 890, the camera recognizes the articulation of the gesture, and the system performs a lookup to validate that the gesture resides in the system and is associated to an action.

At step 895, the system makes a determination as to whether the gesture is associated with an action. If not, the system again reverts back to step 825. If there is an associated action, at step 900, the system executes the associated action (in this case, navigating the user to the associated screen in the UI). The process then ends at “E”.
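The validation flow of FIG. 8 can be summarized as a small event loop. The following is an interpretive Python sketch; the event fields and UI methods are hypothetical names, not the system's actual interfaces:

```python
def handle_event(event, shape_templates, gesture_actions, ui):
    """One pass of the FIG. 8 flow: validate the disturbing shape,
    display the selected state, then validate and execute the gesture."""
    if event.kind == "disturbance":
        # Steps 815-830: validate the shape against the template.
        if event.shape not in shape_templates:
            return None  # step 825: no action taken
        ui.display_selected_state(event.target)
        return "selected"
    if event.kind == "gesture":
        # Steps 840-855 / 885-900: look up the articulated gesture.
        action = gesture_actions.get(event.gesture)
        if action is None:
            return None  # revert to step 825: no action taken
        ui.execute(action)  # e.g., open a menu, navigate to a screen
        return action
    return None
```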

In a more generalized embodiment, a user points to a particular zone within the display area, e.g., a certain application. The system of the invention would recognize such an action by the methods noted above. In embodiments, once the system recognizes the user within a zone and verifies that this is the proper zone, the system would “lock” that selection. Once locked, the user can then provide a gesture such as, for example, an “e” shape to exit the application, which will then be verified and executed by the system of the invention.

While the invention has been described in terms of embodiments, those skilled in the art will recognize that the invention can be practiced with modifications and in the spirit and scope of the appended claims.

CLAIMS

1. A method, comprising: recognizing with a camera a disturbance of a disturbing object in a display zone of a projected image; making a determination as to whether a shape of the disturbing object is a valid shape in a template; displaying a selected state of a selected item when the disturbing object is a valid shape; recognizing with the camera a movement comprising a combination of two substantially simultaneous actions, including a first action of the disturbing object and a second action of an object in an interaction zone between the camera and the projected image; validating the movement comprising the combination of the first action and the second action by comparing the first action and the second action to a plurality of predefined gestures; and executing a command based on the validating of the combination of the first action and the second action.
2. The method of claim 1, wherein the display zone is one or more areas on the image.
3. The method of claim 1, wherein the executed command is a menu driven command.
4. The method of claim 1, wherein the predefined combination of two substantially simultaneous actions includes a trajectory for defined patterns that indicate at least one of a type and place of motion.
5. The method of claim 1, wherein when the disturbing object is a valid shape, the selected state of a selected icon or menu item in a user interface of the projected image is displayed.
6. The method of claim 5, wherein a look-up table associates the plurality of predefined gestures with a respective plurality of commands, the plurality of pre-defined gestures comprising: a single left click gesture that executes a single left mouse click selection; a double left click gesture that executes a double left mouse click selection; a single right click gesture that executes a single right mouse click selection; a copy gesture that executes a copy command; a paste gesture that executes a paste command; an undo gesture that executes an undo command; and a redo gesture that executes a redo command.
7. The method of claim 6, wherein the plurality of pre-defined gestures further comprise: a next gesture that takes a user to a next entry in a list; a previous gesture that takes the user to a previous entry in the list; a first entry gesture that takes the user to a first entry in the list; a last entry gesture that takes the user to a last entry in the list; a home gesture that takes the user to a home of an application; and an exit gesture that executes an exit command and closes the application.
8. The method of claim 1, further comprising, before recognizing the disturbance in the display zone, recognizing a user entering a field of vision of a camera; and determining the proximity of the user to the user interface.
9. A system comprising a server having a database containing data associated with at least one or more predefined gestures, and at least one of a hardware and software component for executing an action based on the at least one or more predefined gestures, the hardware and software: comparing a first action of a disturbing object, detected by a camera, in an interaction zone between the camera and a projected image to a predefined template of a shape; recognizing a gesture detected by the camera, which interrupts a light source between the camera and the projected image, the gesture comprising a combination of two substantially simultaneous hand motions; determining that the gesture corresponds to a predefined command of the user interface; and executing the command corresponding to the gesture.

10. The system of claim 9, wherein the system includes an architecture comprising a services layer, an integration layer and an application layer.
11. The system of claim 10, wherein the services layer comprises at least: a vision interface module responsible for recognizing the gesture and converting an appropriate action to an application; a geometric reasoning module which enables automatic selection of a display based on proximity of the interaction zone to a user and non-occlusion of the interaction zone by the user or other object; and a user localization module to determine a position of the user.
12. The system of claim 11, wherein one of the geometric reasoning module and user localization module recognizes the first action and the second action.
13. The system of claim 9, wherein the at least one of a hardware and software component resides on a server provided by a service provider.
14. A computer program product comprising a computer usable storage medium having readable program code embodied in the storage medium, the computer program product including at least one component to: recognize a disturbance caused by a first finger of a first hand of a user in a display zone of a user interface image projected on a surface in a field of vision of a camera; display a selected state in response to the recognized disturbance; recognize a gesture interrupting a light source and associated with an action to be taken on or associated with the displayed selected state, the gesture comprising a combination of two substantially simultaneous motions by the first hand of the user and a second hand of the user; determine that the gesture corresponds to a command of the user interface based on one or more look-up tables populated with a plurality of pre-defined hand motions and a respective plurality of commands of the user interface; and execute the associated command of the user interface in response to the gesture.
15. The computer program product of claim 14, wherein the recognized gesture includes a motion of a non-tethered object between the light source and the projected image.
16. The computer program product of claim 14, wherein the display zone is one or more areas on the projected image which are associated with an application program.
17. The computer program product of claim 14, wherein when the disturbance is a valid shape, the selected state of a selected item is displayed.
18. The computer program product of claim 14, wherein the executed command is a menu driven command.